50 research outputs found

    Exploration via Elliptical Episodic Bonuses

    In recent years, a number of reinforcement learning (RL) methods have been proposed to explore complex environments which differ across episodes. In this work, we show that the effectiveness of these methods critically relies on a count-based episodic term in their exploration bonus. As a result, despite their success in relatively simple, noise-free settings, these methods fall short in more realistic scenarios where the state space is vast and prone to noise. To address this limitation, we introduce Exploration via Elliptical Episodic Bonuses (E3B), a new method which extends count-based episodic bonuses to continuous state spaces and encourages an agent to explore states that are diverse under a learned embedding within each episode. The embedding is learned using an inverse dynamics model in order to capture controllable aspects of the environment. Our method sets a new state-of-the-art across 16 challenging tasks from the MiniHack suite, without requiring task-specific inductive biases. E3B also matches existing methods on sparse reward, pixel-based Vizdoom environments, and outperforms existing methods in reward-free exploration on Habitat, demonstrating that it can scale to high-dimensional pixel-based observations and realistic environments
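
    The abstract does not state the bonus itself, so the following is only a minimal sketch of one elliptical episodic bonus of the form phi^T C^-1 phi, where C is a ridge-regularised sum of outer products of the embeddings seen so far in the episode. The function name `elliptical_bonuses`, the `ridge` parameter, and the hand-written embeddings are assumptions for illustration; in E3B the embedding is learned with an inverse dynamics model, which is not shown here.

```python
def elliptical_bonuses(embeddings, ridge=0.1):
    """Return one exploration bonus per embedding for a single episode.

    embeddings: list of equal-length lists of floats (hypothetical phi(s_t)).
    ridge: regulariser, so the initial matrix is C = ridge * I (assumed value).
    """
    dim = len(embeddings[0])
    # Inverse of C, initialised to (ridge * I)^-1 at the start of the episode.
    inv_cov = [[(1.0 / ridge) if i == j else 0.0 for j in range(dim)]
               for i in range(dim)]
    bonuses = []
    for phi in embeddings:
        # u = C^-1 phi
        u = [sum(inv_cov[i][j] * phi[j] for j in range(dim)) for i in range(dim)]
        # bonus = phi^T C^-1 phi, computed before phi is added to C
        bonus = sum(phi[i] * u[i] for i in range(dim))
        bonuses.append(bonus)
        # Sherman-Morrison rank-one update: (C + phi phi^T)^-1
        scale = 1.0 / (1.0 + bonus)
        inv_cov = [[inv_cov[i][j] - scale * u[i] * u[j] for j in range(dim)]
                   for i in range(dim)]
    return bonuses


if __name__ == "__main__":
    # A repeated embedding earns a smaller bonus; a new direction does not.
    print(elliptical_bonuses([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]], ridge=1.0))
```

    With one-hot embeddings the bonus shrinks as a state is revisited within the episode, which is how a bonus of this shape generalises a count-based episodic term to continuous embeddings.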

    Dungeons and Data: A Large-Scale NetHack Dataset

    Recent breakthroughs in the development of agents to solve challenging sequential decision making problems such as Go [50], StarCraft [58], or DOTA [3], have relied on both simulated environments and large-scale datasets. However, progress on this research has been hindered by the scarcity of open-sourced datasets and the prohibitive computational cost to work with them. Here we present the NetHack Learning Dataset (NLD), a large and highly-scalable dataset of trajectories from the popular game of NetHack, which is both extremely challenging for current methods and very fast to run [23]. NLD consists of three parts: 10 billion state transitions from 1.5 million human trajectories collected on the NAO public NetHack server from 2009 to 2020; 3 billion state-action-score transitions from 100,000 trajectories collected from the symbolic bot winner of the NetHack Challenge 2021; and, accompanying code for users to record, load and stream any collection of such trajectories in a highly compressed form. We evaluate a wide range of existing algorithms including online and offline RL, as well as learning from demonstrations, showing that significant research advances are needed to fully leverage large-scale datasets for challenging sequential decision making tasks

    The NetHack learning environment

    Progress in Reinforcement Learning (RL) algorithms goes hand-in-hand with the development of challenging environments that test the limits of current methods. While existing RL environments are either sufficiently complex or based on fast simulation, they are rarely both. Here, we present the NetHack Learning Environment (NLE), a scalable, procedurally generated, stochastic, rich, and challenging environment for RL research based on the popular single-player terminal-based roguelike game, NetHack. We argue that NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL, while dramatically reducing the computational resources required to gather a large amount of experience. We compare NLE and its task suite to existing alternatives, and discuss why it is an ideal medium for testing the robustness and systematic generalization of RL agents. We demonstrate empirical success for early stages of the game using a distributed Deep RL baseline and Random Network Distillation exploration, alongside qualitative analysis of various agents trained in the environment. NLE is open source and available at https://github.com/facebookresearch/nle

    PIDT: A Novel Decision Tree Algorithm Based on Parameterised Impurities and Statistical Pruning Approaches

    In the process of constructing a decision tree, the criteria for selecting the splitting attributes influence the performance of the model produced by the decision tree algorithm. The most well-known criteria, such as Shannon entropy and the Gini index, suffer from a lack of adaptability to the datasets. This paper presents novel splitting attribute selection criteria based on families of parameterised impurities, proposed here for use in the construction of optimal decision trees. These criteria rely on families of strictly concave functions that define the new generalised parameterised impurity measures, which we applied in devising and implementing our novel PIDT decision tree algorithm. This paper also proposes the S-condition, based on statistical permutation tests, whose purpose is to ensure that the reduction in impurity, or gain, for the selected attribute is statistically significant. We implemented the S-pruning procedure, based on the S-condition, to prevent model overfitting. These methods were evaluated on a number of simulated and benchmark datasets. Experimental results suggest that by tuning the parameters of the impurity measures and by using our S-pruning method, we obtain better decision tree classifiers with the PIDT algorithm
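
    The abstract does not name the impurity families or spell out the S-condition, so the sketch below substitutes a stand-in: Tsallis entropy as an example of a parameterised concave impurity (alpha = 2 gives the Gini index), and a plain label-permutation test on the impurity gain of one candidate split. All function names, the `alpha` parameter, and the permutation count are assumptions for illustration, not the PIDT implementation.

```python
import random
from collections import Counter


def tsallis_impurity(labels, alpha=2.0):
    """Tsallis entropy of the class distribution (stand-in impurity, alpha != 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    power_sum = sum((count / n) ** alpha for count in Counter(labels).values())
    return (1.0 - power_sum) / (alpha - 1.0)


def split_gain(labels, mask, alpha=2.0):
    """Impurity reduction when labels are split by a boolean mask."""
    left = [y for y, m in zip(labels, mask) if m]
    right = [y for y, m in zip(labels, mask) if not m]
    n = len(labels)
    weighted = (len(left) / n) * tsallis_impurity(left, alpha) \
        + (len(right) / n) * tsallis_impurity(right, alpha)
    return tsallis_impurity(labels, alpha) - weighted


def permutation_p_value(labels, mask, alpha=2.0, n_perm=500, seed=0):
    """Estimate how often shuffled labels reach the observed gain."""
    rng = random.Random(seed)
    observed = split_gain(labels, mask, alpha)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if split_gain(shuffled, mask, alpha) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    labels = [0] * 10 + [1] * 10
    mask = [True] * 10 + [False] * 10  # a split that separates the classes
    print(split_gain(labels, mask), permutation_p_value(labels, mask))
```

    A pruning rule in this spirit would keep a split only when the estimated p-value falls below a chosen significance level; the threshold is a further assumption not given in the abstract.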

    Insights from the NeurIPS 2021 NetHack Challenge

    In this report, we summarize the takeaways from the first NeurIPS 2021 NetHack Challenge. Participants were tasked with developing a program or agent that can win (i.e., ‘ascend’ in) the popular dungeon-crawler game of NetHack by interacting with the NetHack Learning Environment (NLE), a scalable, procedurally generated, and challenging Gym environment for reinforcement learning (RL). The challenge showcased community-driven progress in AI with many diverse approaches significantly beating the previously best results on NetHack. Furthermore, it served as a direct comparison between neural (e.g., deep RL) and symbolic AI, as well as hybrid systems, demonstrating that on NetHack symbolic bots currently outperform deep RL by a large margin. Lastly, no agent got close to winning the game, illustrating NetHack’s suitability as a long-term benchmark for AI research

    COST292 experimental framework for TRECVID 2008

    In this paper, we give an overview of the four tasks submitted to TRECVID 2008 by COST292. The high-level feature extraction framework comprises four systems. The first system transforms a set of low-level descriptors into the semantic space using Latent Semantic Analysis and utilises neural networks for feature detection. The second system uses a multi-modal classifier based on SVMs and several descriptors. The third system uses three image classifiers based on ant colony optimisation, particle swarm optimisation and a multi-objective learning algorithm. The fourth system uses a Gaussian model for singing detection and a person detection algorithm. The search task is based on an interactive retrieval application combining retrieval functionalities in various modalities with a user interface supporting automatic and interactive search over all queries submitted. The rushes task submission is based on a spectral clustering approach for removing similar scenes based on eigenvalues of the frame similarity matrix and a redundancy removal strategy which depends on semantic feature extraction such as camera motion and faces. Finally, the submission to the copy detection task is conducted by two different systems. The first system consists of a video module and an audio module. The second system is based on mid-level features that are related to the temporal structure of videos

    Platform session


    Focusing and guiding charged particles using two-layer superconducting tubes: A theoretical approach

    A theoretical model of the self-focusing of a pulsed electron beam in a two-layer superconducting tube is developed. The self-focusing force is calculated and, by defining the pair-breaking time of the superconducting pairs, it is shown that the self-focusing depends, among other factors, on the ratio of the beam pulse period to the pair-breaking time. The model is validated by evaluating both the pair-breaking time and the focusing length

    Nature and dynamics of humoral immune disorders in the course of acute intestinal obstruction caused by hernia strangulation

    Immune reactivity was studied in 120 patients with strangulated hernia, grouped by the presence (group I) or absence (group II) of signs of acute intestinal obstruction. Changes in humoral immunity were less pronounced and were characterized by a decrease in immunoglobulins G and A on the third postoperative day; they may be associated not only with a quantitative and functional deficiency of B-cells but also with insufficient activity of cytokines secreted by T-cells. The data lead to the conclusion that strangulated hernia complicated by acute intestinal obstruction involves significant disorders of immunological reactivity, which manifest as a decline in both cellular and humoral immunity and as increased activity of nonspecific immunological reactivity. These changes are more profound and persistent than in patients with an uncomplicated course of strangulated hernia. Immunosuppression is most pronounced on the 1st and 3rd postoperative days. As patients recover clinically and the disease follows a positive course, the studied parameters gradually normalize, but even by the 7th postoperative day they do not reach control values